feat: add benchmark framework for collection mount performance by chirag-bruno · Pull Request #7915 · usebruno/bruno

chirag-bruno · 2026-05-05T14:10:32Z

Description

Playwright benchmark tests measuring collection mount time across bru/yml formats and sizes (50-5000 requests)
IPC listener approach for precise mount-complete signal
Generic benchmark utils: stats, results I/O, baseline comparison, PR commenting
Collection generator using @usebruno/filestore serializers
CI workflow running on ubuntu, macos, and windows with PR comment reporting
Regression detection against committed baselines with configurable threshold
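
The regression check described in the last bullet could look roughly like the sketch below. It is an illustration only, not the PR's compare.js: the function name `findRegressions`, the per-entry `mean` field, and the sample numbers are all assumptions.

```javascript
// Hypothetical sketch: flag benchmarks whose mean slowed down by more than
// `thresholdPercent` relative to the committed baseline.
function findRegressions(results, baseline, thresholdPercent) {
  const regressions = [];
  for (const [key, current] of Object.entries(results)) {
    const base = baseline[key];
    // Skip entries with no usable baseline instead of dividing by zero.
    if (!base || !(base.mean > 0)) continue;
    const changePercent = ((current.mean - base.mean) / base.mean) * 100;
    if (changePercent > thresholdPercent) {
      regressions.push({ key, changePercent });
    }
  }
  return regressions;
}

// Made-up numbers purely for illustration.
const baseline = { 'bru-50': { mean: 2000 }, 'yml-50': { mean: 2000 } };
const results = { 'bru-50': { mean: 2100 }, 'yml-50': { mean: 2600 } };
const regressions = findRegressions(results, baseline, 20);
console.log(regressions);
```

With a 20% threshold, only the entry that slowed down by roughly 30% is reported; the 5% change stays within tolerance.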

Contribution Checklist:

I've used AI significantly to create this pull request
The pull request only addresses one issue or adds one feature.
The pull request does not introduce any breaking changes
I have added screenshots or gifs to help explain the change if applicable.
I have read the contribution guidelines.
Create an issue and link to the pull request.

Note: Keeping the PR small and focused helps make it easier to review and merge. If you have multiple changes you want to make, please consider submitting them as separate pull requests.


Summary by CodeRabbit

New Features
- Added performance benchmarking system for measuring application performance across Ubuntu, macOS, and Windows.
- Automated benchmark comparison against established baselines with regression detection.
- Added test:benchmark npm script for running performance tests.
Chores
- Updated project configuration to support benchmark test execution.
- Added benchmark result directories to .gitignore.

coderabbitai · 2026-05-05T14:10:48Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.


Walkthrough

This PR introduces a comprehensive benchmarking infrastructure for measuring collection mount performance across operating systems. It adds GitHub Actions workflows, Playwright benchmark configuration, collection generation utilities, statistical helpers, result comparison tooling, PR integration, and baseline metrics for Ubuntu, macOS, and Windows.

Changes

Benchmark Infrastructure & Test Suite

Layer / File(s)	Summary
Workflow & Action Configuration `.github/workflows/benchmarks.yml`, `.github/actions/tests/run-benchmark-tests/action.yml`, `playwright.benchmark.config.ts`, `package.json`, `.gitignore`	GitHub Actions workflow runs benchmarks on a matrix of operating systems with conditional steps for updating vs. comparing baselines. Composite action abstracts benchmark execution with OS-specific runner logic. Playwright config isolates benchmarks to a single worker with extended timeouts and JSON reporting. npm script and ignore patterns complete the CI/test infrastructure.
Benchmark Utility Modules `tests/benchmarks/utils/stats.ts`, `tests/benchmarks/utils/results.ts`, `tests/benchmarks/utils/collection-generator.ts`	Statistical functions compute mean, median, percentiles, and standard deviation. Results I/O handles JSON serialization/deserialization with typed schemas. Collection generator produces Bruno/OpenCollection fixtures on disk at configurable sizes and formats for repeatability.
Benchmark Test Implementation `tests/benchmarks/mounting/collection-mount.bench.ts`	Playwright test measures collection mount timing across multiple sizes and formats by instrumenting Electron dialog mocking, IPC event listening, and performance.now() measurement. Aggregates per-configuration statistics and writes results to JSON.
Result Processing & Reporting `tests/benchmarks/utils/compare.js`, `tests/benchmarks/utils/pr-comment.js`	compare.js CLI validates results against baseline with configurable regression threshold, supports baseline update mode. pr-comment.js constructs and posts/updates GitHub PR comments showing per-benchmark percentage changes and regression status.
Baseline Data `tests/benchmarks/mounting/baseline.*.json`	Platform-specific baseline files (Ubuntu, macOS, Windows) define regression thresholds and per-benchmark mean/p50 targets for detection of performance regressions.
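
To make the statistics row above concrete, here is a minimal sketch of how per-configuration timings might be summarized. The helper names, the nearest-rank percentile method, and the population standard deviation are assumptions; stats.ts in the PR may compute these differently.

```javascript
// Hypothetical helpers; not necessarily what stats.ts implements.
function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

function percentile(values, p) {
  // Nearest-rank percentile on a sorted copy; p is in the range (0, 100].
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

function stdDev(values) {
  // Population standard deviation.
  const m = mean(values);
  return Math.sqrt(mean(values.map((v) => (v - m) ** 2)));
}

function summarize(timings) {
  return {
    mean: mean(timings),
    p50: percentile(timings, 50),
    p95: percentile(timings, 95),
    stdDev: stdDev(timings),
    samples: timings.length
  };
}

// Three made-up mount timings in milliseconds.
const summary = summarize([1200, 1000, 1100]);
console.log(summary);
```

With only a handful of iterations per configuration, high percentiles such as p95 collapse to the slowest sample, so the mean and median carry most of the signal.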

Sequence Diagram

sequenceDiagram
    participant GHA as GitHub Actions
    participant CAct as Composite Action
    participant Pw as Playwright Runner
    participant Bench as Benchmark Test
    participant CmpJS as compare.js
    participant CommentJS as pr-comment.js
    participant GHApi as GitHub API

    GHA->>CAct: Trigger with os & update-baseline
    CAct->>Pw: Run benchmarks (with xvfb on Linux)
    Pw->>Bench: Execute mount tests
    Bench->>Bench: Measure timing, generate results.json
    CAct->>CmpJS: Load results & baseline
    alt update-baseline == true
        CmpJS->>CmpJS: Compute new baseline from results
        CmpJS->>Pw: Write baseline.json
    else compare mode
        CmpJS->>CmpJS: Compare against baseline
        CmpJS->>CmpJS: Check for regressions
    end
    GHA->>GHA: Upload results & baseline artifacts
    alt Pull Request event
        GHA->>CommentJS: Load results & baseline
        CommentJS->>CommentJS: Build comparison table
        CommentJS->>GHApi: Post/update PR comment
        GHApi->>GHApi: Display benchmark results
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

Suggested labels

perf

Suggested reviewers

helloanoop
lohit-bruno
naman-bruno

Poem

🚀 Benchmarks take flight across every OS,
Collection mounts timed with statistical finesse,
Baselines established, regressions in sight,
Performance metrics shining bright—
Speed insights for all, PR comments in flight! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 16.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title 'feat: add benchmark framework for collection mount performance' accurately and concisely summarizes the primary change: introducing a complete benchmarking framework for measuring collection mount performance.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

Actionable comments posted: 5

🧹 Nitpick comments (5)

tests/benchmarks/mounting/collection-mount.bench.ts (2)

81-81: ⚡ Quick win

Replace page.waitForTimeout(500) with a deterministic wait.

Per coding guidelines, page.waitForTimeout() should only be used when no suitable expect() locator-based wait is available.

If closeAllCollections leaves a detectable UI change (e.g., the sidebar collection list becomes empty), prefer:

-          await page.waitForTimeout(500);
+          await expect(page.locator('#sidebar-collection-name')).toHaveCount(0);

If there's genuinely no stable locator to await after close, add a brief comment justifying the timeout.

As per coding guidelines: "Try to reduce usage of page.waitForTimeout(); in code unless absolutely necessary and the locator cannot be found using existing expect() playwright calls."

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/benchmarks/mounting/collection-mount.bench.ts` at line 81, Replace the
non-deterministic sleep (page.waitForTimeout(500)) in the collection-mount
benchmark with a locator-based expect or a justified comment: after calling
closeAllCollections() wait for a concrete UI change (e.g.,
expect(sidebarCollectionList).toBeEmpty() or
expect(collectionItem.locator('text=...')).not.toBeVisible()) that signals
completion; if there truly is no stable DOM change to await, keep a very brief
comment explaining why a fixed timeout is required and reduce the timeout to the
minimum necessary while noting the reason.

70-95: ⚡ Quick win

Wrap test body steps with test.step for clearer reports.

The inner test body performs distinct phases — generate, measure (× N), record, log — but has no test.step wrapping, making failure traces harder to read.

♻️ Example restructuring with test.step

         test(`mount ${format} collection with ${size} requests`, async ({ page, electronApp, createTmpDir }) => {
           test.setTimeout((2 + Math.ceil(size / 100) * 2) * 60_000);
           const timings: number[] = [];

           for (let i = 0; i < ITERATIONS_PER_SIZE; i++) {
-            const collectionName = `bench-${format}-${size}-iter-${i}`;
-            const collectionDir = await createTmpDir(`bench-${format}-${size}-${i}`);
-            generateCollection({ dir: collectionDir, name: collectionName, requestCount: size, format });
-
-            const elapsed = await measureCollectionMount(page, electronApp, collectionDir, collectionName);
-            timings.push(Math.round(elapsed));
-            await page.waitForTimeout(500);
+            await test.step(`iteration ${i + 1}`, async () => {
+              const collectionName = `bench-${format}-${size}-iter-${i}`;
+              const collectionDir = await createTmpDir(`bench-${format}-${size}-${i}`);
+              await test.step('generate collection', () => {
+                generateCollection({ dir: collectionDir, name: collectionName, requestCount: size, format });
+              });
+              const elapsed = await test.step('measure mount', () =>
+                measureCollectionMount(page, electronApp, collectionDir, collectionName)
+              );
+              timings.push(Math.round(elapsed));
+            });
           }

As per coding guidelines: "Promote the use of test.step as much as possible so the generated reports are easier to read."

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/benchmarks/mounting/collection-mount.bench.ts` around lines 70 - 95,
The test body for the `test("mount ${format} collection with ${size} requests",
...)` case should be split into explicit Playwright `test.step` calls to improve
report clarity: wrap the collection generation loop (calls to
`generateCollection` and `createTmpDir`) in a "generate collections" step, wrap
each measurement call to `measureCollectionMount` (and the per-iteration wait)
inside a "measure mount (iteration i)" step or a single "measure mounts" step,
and wrap result aggregation/annotation (uses of `timings`, `resultKey`,
`summarize`, and `test.info().annotations.push`) and the final console log into
a "record results" step; update the test body (the anonymous async function
passed to `test(...)`) to call `await test.step(name, async () => { ... })`
around those phases so failures show as separate steps while preserving existing
logic and variables.

tests/benchmarks/utils/compare.js (1)

62-62: ⚡ Quick win

baseline.collections fallback is undocumented dead code.

The baseline schema only defines entries; the baseline.collections fallback has no corresponding writer and will silently swallow a missing entries key instead of failing loudly.

-const baselineEntries = baseline.entries || baseline.collections || {};
+const baselineEntries = baseline.entries || {};

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/benchmarks/utils/compare.js` at line 62, Replace the undocumented
fallback that uses baseline.collections with a strict requirement for
baseline.entries: change the initialization of baselineEntries (the variable
named baselineEntries) to only read baseline.entries and add a guard that throws
(or returns a clear error) when baseline.entries is missing so the function
fails loudly instead of silently using a dead-path; this ensures the code path
in compare.js validates the expected schema rather than swallowing a missing
entries key.
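
A sketch of the "fail loudly" guard requested above, assuming the baseline is a parsed JSON object whose benchmark data lives under `entries`. The helper name `getBaselineEntries` and the error message are hypothetical.

```javascript
// Hypothetical helper: throw when the baseline lacks an `entries` object
// rather than silently falling back to an empty one.
function getBaselineEntries(baseline) {
  const entries = baseline && baseline.entries;
  if (entries === null || typeof entries !== 'object' || Array.isArray(entries)) {
    throw new Error('Invalid baseline: expected an "entries" object');
  }
  return entries;
}

// A well-formed baseline passes through unchanged.
const ok = getBaselineEntries({ entries: { 'bru-50': { mean: 2000 } } });

// A baseline using the legacy `collections` key is rejected.
let failure = null;
try {
  getBaselineEntries({ collections: {} });
} catch (err) {
  failure = err.message;
}
console.log(Object.keys(ok), failure);
```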

tests/benchmarks/utils/pr-comment.js (1)

29-33: ⚡ Quick win

pct is a string — parse it before comparing to numbers.

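.toFixed(1) returns a string, so the comparisons on lines 30–31 and 33 (pct > threshold, pct < -threshold, pct > 0) rely on implicit string-to-number coercion. Relational operators do coerce the string back to a number, but the pattern is fragile: any later use of + or === on pct would operate on the string instead. Separately, when base.mean is 0 the division yields Infinity (or NaN if data.mean is also 0), and that value is printed verbatim in the table. Keep pct numeric; format only at the point of interpolation.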

♻️ Suggested fix

-  const pct = ((data.mean - base.mean) / base.mean * 100).toFixed(1);
-  const status = pct > threshold ? '🔴 REGRESSION' : pct < -threshold ? '🟢 IMPROVED' : '✅ OK';
-  if (pct > threshold) hasRegression = true;
-
-  body += `| ${key} | ${Math.round(data.mean)} | ${base.mean} | ${pct > 0 ? '+' : ''}${pct}% | ${status} |\n`;
+  const pct = (data.mean - base.mean) / base.mean * 100;
+  const pctStr = pct.toFixed(1);
+  const status = pct > threshold ? '🔴 REGRESSION' : pct < -threshold ? '🟢 IMPROVED' : '✅ OK';
+  if (pct > threshold) hasRegression = true;
+
+  body += `| ${key} | ${Math.round(data.mean)} | ${base.mean} | ${pct > 0 ? '+' : ''}${pctStr}% | ${status} |\n`;

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/benchmarks/utils/pr-comment.js` around lines 29 - 33, pct is being set
to a string via .toFixed(1) and then compared numerically; change the code so
pct is a numeric value used for comparisons (compute pct = (data.mean -
base.mean) / base.mean * 100 without .toFixed), perform numeric comparisons
against threshold and -threshold (affecting the checks using pct in the
hasRegression assignment and status calculation), and only format pct with
toFixed(1) when building the output interpolation for body (e.g., use a
formattedPct variable for the string). Ensure the same named symbols are
updated: pct (numeric), hasRegression, status, and the body interpolation where
`${pct}` is inserted.
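
The same idea as a self-contained sketch, extended with an explicit guard for a zero or missing baseline mean. The function name `formatRow`, its return shape, and the "NO BASELINE" status text are assumptions made for illustration; they do not come from pr-comment.js.

```javascript
// Hypothetical row formatter: keep the percentage numeric for comparisons and
// convert to a string only when interpolating into the Markdown table.
function formatRow(key, currentMean, baseMean, threshold) {
  const changePercent = ((currentMean - baseMean) / baseMean) * 100;
  if (!Number.isFinite(changePercent)) {
    // A zero or missing baseline mean gives Infinity or NaN; report that
    // explicitly rather than printing "Infinity%".
    return {
      row: `| ${key} | ${Math.round(currentMean)} | ${baseMean} | n/a | ⚠️ NO BASELINE |`,
      regression: false
    };
  }
  const regression = changePercent > threshold;
  const status = regression
    ? '🔴 REGRESSION'
    : changePercent < -threshold ? '🟢 IMPROVED' : '✅ OK';
  const sign = changePercent > 0 ? '+' : '';
  return {
    row: `| ${key} | ${Math.round(currentMean)} | ${baseMean} | ${sign}${changePercent.toFixed(1)}% | ${status} |`,
    regression
  };
}

// Made-up values: one regression and one entry with an unusable baseline.
const slower = formatRow('bru-50', 2500, 2000, 20);
const noBase = formatRow('yml-50', 2500, 0, 20);
console.log(slower.row);
console.log(noBase.row);
```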

tests/benchmarks/mounting/baseline.json (1)

1-46: ⚡ Quick win

Placeholder baseline values — bru-* and yml-* entries are identical, which may mask format-specific regressions.

All ten entries share the same mean/p50 figures (e.g., both bru-50 and yml-50 have mean: 2000). Committing equal baselines for two formats that likely have distinct I/O characteristics means the comparison is not meaningful until real numbers replace these. Consider running the suite once on a reference machine and replacing these with the actual measured medians before merging, or at minimum add a prominent "status": "placeholder" field and fail CI baseline comparison when that flag is present.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/benchmarks/mounting/baseline.json` around lines 1 - 46, The baseline
file currently uses placeholder identical values for all entries (see entries
"bru-50"..."bru-5000" and "yml-50"..."yml-5000"), which masks format-specific
regressions; replace these placeholders by running the benchmark on a reference
machine and updating the entries with the real measured mean/p50 (use the
provided script node tests/benchmarks/utils/compare.js --update-baseline) so
each "bru-*" and "yml-*" has accurate numbers, or if you cannot run benchmarks
yet add a top-level "status":"placeholder" flag in the JSON and update CI to
fail baseline comparisons when status is "placeholder" to prevent merging until
real values are recorded.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/benchmarks.yml:
- Around line 61-73: The "Comment Benchmark Results on PR" workflow step (uses:
actions/github-script@v7 and calling run({ resultsPath:
'tests/benchmarks/results/mounting.json', baselinePath:
'tests/benchmarks/mounting/baseline.json', title: 'Benchmark Results —
Collection Mount' })) can fail with 403 for forked PRs; update that step to
include continue-on-error: true to avoid failing the whole job on permission
errors (alternatively replace the event with a secure
pull_request_target/workflow_run pattern if you need to support writing comments
from privileged runs).

In `@tests/benchmarks/utils/collection-generator.ts`:
- Around line 46-47: The ymlContent produced by stringifyCollection may be
null/undefined and is written directly via fs.writeFileSync, so add a fallback
like the existing one for the 'bru' output: compute ymlContent =
stringifyCollection(...) || `meta: { name: "${name}", type: "collection",
opencollection: "1.0.0" }` (or a minimal valid YAML string), then pass that safe
ymlContent to fs.writeFileSync; update references to stringifyCollection and
ymlContent accordingly so opencollection.yml never contains "undefined"/"null".

In `@tests/benchmarks/utils/compare.js`:
- Line 21: The file uses ESM import syntax (import { existsSync, readFileSync,
writeFileSync } from 'fs') inside tests/benchmarks/utils/compare.js which causes
a SyntaxError under default CommonJS Node; either rename compare.js →
compare.mjs and update .github/workflows/benchmarks.yml to invoke the .mjs path
(Option A), or convert the file to CommonJS by replacing the import with const {
existsSync, readFileSync, writeFileSync } = require('fs') and ensure any other
ESM syntax is changed accordingly (Option B); pick one option and make the
matching change to the CI workflow if you rename the file.

In `@tests/benchmarks/utils/pr-comment.js`:
- Around line 77-78: The code reads baselinePath without checking existence,
causing an uncaught ENOENT; wrap the baseline read in the same guard/handling
used for resultsPath — either check fs.existsSync(baselinePath) before
JSON.parse or perform the fs.readFileSync/JSON.parse inside a try/catch, and on
error emit a clear log/error message and exit or return; update the lines that
call JSON.parse(fs.readFileSync(baselinePath, 'utf-8')) (and any surrounding
logic using baseline) to use this existence check or try/catch so missing or
invalid baseline files produce a helpful message instead of an unhandled
exception.
- Around line 46-52: The current call using github.rest.issues.listComments(...)
and then searching comments.find(...) can miss the existing marker on PRs with
>100 comments; update the call to either include per_page: 100 and handle
pagination, or replace the listComments call with
github.paginate(github.rest.issues.listComments, { owner: context.repo.owner,
repo: context.repo.repo, issue_number: context.issue.number, per_page: 100 })
and then search the returned array for the marker (remove the .data unwrap since
paginate returns items directly); adjust the variable currently named comments
and the comments.find(...) usage accordingly so duplicate benchmark comments are
not created.
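
For the pagination point in the last item, a possible shape of the find-or-create logic is sketched below. It assumes the script runs inside actions/github-script, where `github` is an Octokit client with the paginate plugin and `context` describes the triggering event. The `MARKER` string and the function name `upsertBenchmarkComment` are hypothetical.

```javascript
// MARKER is a hypothetical hidden tag used to recognise the benchmark comment.
const MARKER = '<!-- benchmark-results -->';

async function upsertBenchmarkComment(github, context, body) {
  const params = {
    owner: context.repo.owner,
    repo: context.repo.repo,
    issue_number: context.issue.number
  };
  // paginate() follows every page and returns the items directly (no `.data`).
  const comments = await github.paginate(github.rest.issues.listComments, {
    ...params,
    per_page: 100
  });
  const existing = comments.find((c) => c.body && c.body.includes(MARKER));
  const fullBody = `${MARKER}\n${body}`;
  if (existing) {
    // Reuse the earlier comment so repeated runs do not pile up duplicates.
    await github.rest.issues.updateComment({
      owner: params.owner,
      repo: params.repo,
      comment_id: existing.id,
      body: fullBody
    });
    return 'updated';
  }
  await github.rest.issues.createComment({ ...params, body: fullBody });
  return 'created';
}
```

Because the lookup walks every page, an existing benchmark comment is still found on PRs with long discussion threads.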


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b03d3316-2bd5-4d22-9187-5e9061d11d3b

📥 Commits

Reviewing files that changed from the base of the PR and between d332d8e and c536810.

📒 Files selected for processing (13)

.github/actions/tests/run-benchmark-tests/action.yml
.github/workflows/benchmarks.yml
.gitignore
package.json
playwright.benchmark.config.ts
playwright.config.ts
tests/benchmarks/mounting/baseline.json
tests/benchmarks/mounting/collection-mount.bench.ts
tests/benchmarks/utils/collection-generator.ts
tests/benchmarks/utils/compare.js
tests/benchmarks/utils/pr-comment.js
tests/benchmarks/utils/results.ts
tests/benchmarks/utils/stats.ts

coderabbitai

♻️ Duplicate comments (1)

.github/workflows/benchmarks.yml (1)

75-87: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Prevent fork PR permission errors from failing the benchmark job

Line 75–87 can still fail on fork-originated pull_request runs when PR comment write is blocked. Make the comment step non-blocking so benchmark execution/artifacts remain available.

Suggested patch

       - name: Comment Benchmark Results on PR
         if: github.event_name == 'pull_request' && !cancelled() && matrix.os-name == 'ubuntu'
+        continue-on-error: true
         uses: actions/github-script@v7
         with:
           script: |
             const run = require('./tests/benchmarks/utils/pr-comment.js');
             await run({

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/benchmarks.yml around lines 75 - 87, The "Comment
Benchmark Results on PR" step currently calls the pr-comment script and can fail
on forked PRs when comment permissions are blocked; make this step non-blocking
by adding continue-on-error: true to that step (the step with name "Comment
Benchmark Results on PR" that uses actions/github-script@v7 and invokes the
run(...) call) so failures to post the comment won't fail the whole job and
artifacts/results remain available.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c9bd73f9-89e6-4048-a88a-9f2e6dad5879

📥 Commits

Reviewing files that changed from the base of the PR and between ae9c9e8 and 80205e9.

📒 Files selected for processing (3)

.github/workflows/benchmarks.yml
tests/benchmarks/mounting/baseline.json
tests/benchmarks/mounting/collection-mount.bench.ts

✅ Files skipped from review due to trivial changes (1)

tests/benchmarks/mounting/baseline.json

🚧 Files skipped from review as they are similar to previous changes (1)

tests/benchmarks/mounting/collection-mount.bench.ts

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/benchmarks.yml:
- Around line 67-73: Replace the per-matrix push in the "Commit Updated
Baseline" step with an artifacts workflow: in the matrix job (the step keyed by
name "Commit Updated Baseline" that currently checks
github.event.inputs.update-baseline == 'true' and references ${{ matrix.os-name }}),
stop running git commit/push and instead upload the updated baseline file
as an artifact named with the OS (use actions/upload-artifact, upload
tests/benchmarks/mounting/baseline.${{ matrix.os-name }}.json). Add a single
downstream job (e.g., "merge-baseline-updates") that runs once when
github.event.inputs.update-baseline == 'true', depends on the matrix job(s) via
needs, downloads all per-OS artifacts with actions/download-artifact, places
them into tests/benchmarks/mounting/, then performs git config, a single git
add/commit -m "chore: update baseline" and git push so one atomic commit handles
all baseline files and avoids concurrent non-fast-forward failures.
- Around line 67-73: The commit step named "Commit Updated Baseline" currently
uses the chained shell expression `git diff --staged --quiet || git commit -m
... && git push`, which can run `git push` even when no commit was created;
replace this operator chaining with an explicit conditional that checks the
staged-diff result and only runs the commit-and-push sequence when changes exist
(i.e., test `git diff --staged --quiet` and, if it indicates changes, run `git
commit -m "chore: update ${{ matrix.os-name }} benchmark baseline"` followed by
`git push`), ensuring push is executed only after a successful commit.

In `@tests/benchmarks/mounting/baseline.windows.json`:
- Around line 3-43: The baseline JSON "entries" currently only includes sizes up
to "bru-3000" and "yml-3000", but the benchmark coverage should span 50–5000;
run the mount benchmarks for the 5000-size collections and add new baseline
records named "bru-5000" and "yml-5000" to the entries object (including mean
and p50 stats) so regression detection covers the largest collection size.
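
A coverage gap like the one in the last item could also be caught mechanically. The sketch below lists format/size combinations that have no baseline entry. The function name `missingBaselineKeys` is hypothetical, and the `${format}-${size}` key pattern is inferred from the entry names quoted in the review (bru-50, yml-3000, and so on).

```javascript
// Hypothetical check: report every expected benchmark key that is absent
// from the baseline's entries object.
function missingBaselineKeys(entries, formats, sizes) {
  const missing = [];
  for (const format of formats) {
    for (const size of sizes) {
      const key = `${format}-${size}`;
      if (!(key in entries)) missing.push(key);
    }
  }
  return missing;
}

// Illustrative baseline that stops at 3000 while the suite expects 5000 too.
const entries = { 'bru-50': {}, 'bru-3000': {}, 'yml-50': {}, 'yml-3000': {} };
const missing = missingBaselineKeys(entries, ['bru', 'yml'], [50, 3000, 5000]);
console.log(missing);
```

Running a check of this kind in compare mode would surface a mismatch between the sizes the benchmark exercises and the sizes the baseline records, whichever side is out of date.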


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0c3f77dd-22e8-4b5e-bcbc-78927b8e9bd9

📥 Commits

Reviewing files that changed from the base of the PR and between 8e14991 and 33ae801.

📒 Files selected for processing (5)

.github/actions/tests/run-benchmark-tests/action.yml
.github/workflows/benchmarks.yml
tests/benchmarks/mounting/baseline.macos.json
tests/benchmarks/mounting/baseline.ubuntu.json
tests/benchmarks/mounting/baseline.windows.json

✅ Files skipped from review due to trivial changes (2)

tests/benchmarks/mounting/baseline.macos.json
tests/benchmarks/mounting/baseline.ubuntu.json

sid-bruno

Things to keep in mind for the next set of tasks

The action's PR comment might fail. Please bring it to my notice once merged so that I can update the token with the permissions needed for that.
~~A benchmark will need more than 1 run for each collection set. The mean, median, p95, and p99 are currently based on just 1 run per collection variant, which is not apt for a fair benchmark.~~ Correction: 3 iterations are being used for the computation.

- Playwright benchmark tests measuring collection mount time across bru/yml formats and sizes (50-5000 requests) - IPC listener approach for precise mount-complete signal - Generic benchmark utils: stats, results I/O, baseline comparison, PR commenting - Collection generator using @usebruno/filestore serializers - CI workflow running on ubuntu, macos, and windows with PR comment reporting - Regression detection against committed baselines with configurable threshold

…fix review issues
- Same collection mounted/unmounted across iterations for cold vs cached comparison
- workflow_dispatch has update-baseline boolean input for manual baseline updates
- Fix string comparison bug in pr-comment.js (pct was string from toFixed)
- Remove dead baseline.collections fallback in compare.js and pr-comment.js
- Remove unnecessary waitForTimeout between iterations
- Rename pct/pctChange to changePercent/percentChange for readability
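
The exact line fixed in pr-comment.js is not shown here, so the following only illustrates the general class of bug: `Number.prototype.toFixed` returns a string, and comparing two such strings is lexicographic rather than numeric. The helper names and values are hypothetical.

```javascript
// Buggy pattern: both operands are strings, so the comparison is
// character by character ("9" sorts after "1").
function exceedsThresholdBuggy(changePercent, thresholdPercent) {
  return changePercent.toFixed(2) > thresholdPercent.toFixed(2);
}

// Safer pattern: compare the numbers, format only for display.
function exceedsThreshold(changePercent, thresholdPercent) {
  return changePercent > thresholdPercent;
}

function formatPercent(changePercent) {
  return `${changePercent.toFixed(2)}%`;
}

console.log(typeof (9.5).toFixed(2));        // "string"
console.log(exceedsThresholdBuggy(9.5, 10)); // true, because "9.50" > "10.00"
console.log(exceedsThreshold(9.5, 10));      // false
console.log(formatPercent(9.5));             // "9.50%"
```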

…mmit on update-baseline
- Reduce max collection size from 5000 to 3000 to keep CI runtime reasonable
- Update baseline values from actual CI run data (worst case across ubuntu/macos/windows)
- Auto-commit updated baseline.json when update-baseline is triggered via workflow_dispatch
- Reuse same collection across iterations for cold vs cached comparison
- Fix string comparison bug and remove dead code from review feedback
- Rename pct variables to changePercent for readability
- Remove unnecessary waitForTimeout between iterations

- Add continue-on-error to PR comment step since GITHUB_TOKEN lacks write access on cross-fork PRs

- Split baseline.json into baseline.ubuntu/macos/windows.json with real CI data
- Action and workflow dynamically reference baseline per OS
- PR comment posted per OS with OS-specific comparison
- Auto-commit updated baseline on workflow_dispatch with update-baseline flag
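
Resolving a per-OS baseline file might look like the sketch below. The file names match the ones listed under "Files selected for processing", but the helper `baselinePathFor` and its error handling are assumptions. The actual action and workflow may build the path differently, for example in YAML expressions.

```javascript
// Directory and file-name pattern taken from the files listed in this PR.
const BASELINE_DIR = 'tests/benchmarks/mounting';
const KNOWN_OS = ['ubuntu', 'macos', 'windows'];

// Hypothetical helper: map an OS label to its committed baseline file.
function baselinePathFor(os) {
  if (!KNOWN_OS.includes(os)) {
    throw new Error(`No baseline for OS "${os}"`);
  }
  return `${BASELINE_DIR}/baseline.${os}.json`;
}

console.log(baselinePathFor('ubuntu'));
```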

…ults
- writeResults now accepts SuiteMeta with name, unit, and direction
- Results JSON includes suite field for the visualization dashboard to ingest
- Mounting benchmark outputs unit: ms, direction: smaller
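
A rough sketch of how suite metadata could travel with the results follows. Only the field names `name`, `unit`, and `direction`, the `suite` field, and the values `ms` and `smaller` come from the commit message. The function names, the overall results layout, the suite name, and the `larger` direction value are assumptions for illustration.

```javascript
// Hypothetical payload builder: attach suite metadata to the results object.
function buildResults(suiteMeta, entries) {
  const { name, unit, direction } = suiteMeta;
  return { suite: { name, unit, direction }, entries };
}

// Hypothetical use of direction: with "smaller", a higher value is worse.
// Any other value is treated as "larger is better" (an assumed counterpart).
function isWorse(direction, baselineValue, currentValue) {
  return direction === 'smaller'
    ? currentValue > baselineValue
    : currentValue < baselineValue;
}

// Suite name and numbers are placeholders.
const results = buildResults(
  { name: 'mounting', unit: 'ms', direction: 'smaller' },
  { 'bru-50': { mean: 1, p50: 1 } }
);
console.log(JSON.stringify(results, null, 2));
```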

chirag-bruno requested review from bijin-bruno, helloanoop, lohit-bruno, naman-bruno and sid-bruno as code owners May 5, 2026 14:10

pull-request-size Bot added the size/XL label May 5, 2026

coderabbitai Bot reviewed May 5, 2026

Comment thread .github/workflows/benchmarks.yml

Comment thread tests/benchmarks/utils/collection-generator.ts

Comment thread tests/benchmarks/utils/compare.js

Comment thread tests/benchmarks/utils/pr-comment.js

Comment thread tests/benchmarks/utils/pr-comment.js

coderabbitai Bot reviewed May 5, 2026

Comment thread .github/workflows/benchmarks.yml

Comment thread tests/benchmarks/mounting/baseline.windows.json

chirag-bruno force-pushed the feat/benchmark-collection-mount branch from 6dbaaf0 to 69e92bc Compare May 11, 2026 09:40

sid-bruno approved these changes May 11, 2026

chirag-bruno force-pushed the feat/benchmark-collection-mount branch 2 times, most recently from f862b30 to 30ef913 Compare May 13, 2026 19:09

cchirag added 7 commits May 15, 2026 15:27

fix: handle PR comment permission error on fork PRs

4325f21

- Add continue-on-error to PR comment step since GITHUB_TOKEN lacks write access on cross-fork PRs

feat: include suite metadata (name, unit, direction) in benchmark res…

791d528

…ults
- writeResults now accepts SuiteMeta with name, unit, and direction
- Results JSON includes suite field for the visualization dashboard to ingest
- Mounting benchmark outputs unit: ms, direction: smaller

feat: extract timing helpers, capture raw float ms in mount benchmark

29668da

chirag-bruno force-pushed the feat/benchmark-collection-mount branch from 30ef913 to 29668da Compare May 15, 2026 09:58

sid-bruno approved these changes May 18, 2026

sid-bruno merged commit 736c050 into usebruno:main May 18, 2026
20 of 21 checks passed

sid-bruno merged 7 commits into usebruno:main from chirag-bruno:feat/benchmark-collection-mount
